Statement of Contribution: We split the work and after completion we collaborated together to do and understand the other persons work. Nazli worked on question-1 and Siddhesh worked on question-2.
Caption
Use ggplot2 and the function from step 2 to create a density plot of
Infection risk in which outliers are plotted as a diamond symbol ( â ) .
Make some analysis of this graph.
We can see from the above graph, it resembles the shape of a normal
distribution which majority of the observations around 4.5.There are
only a few outliers i.e 5 outliers to be exact.
Produce graphs of the same kind as in step 3 but for all other
quantitative variables in the data (to iterate over several column
names, you can use .data pronoun, i.e.
aes(x=.data[variable_that_contains_column_name_as_string], âĤ) ). Put
these graphs into one (hint: arrangeGrob() in gridExtra package can be
used) and make some analysis.
We notice outliers for all the variables except for the variable
âAvailable Facilities & Servicesâ(V12). We also see skewness in some
variables like âNumber of Bedsâ (V7), âAverage Daily Censusâ(V10) and
âNumber of Nursesâ(V11).
Create a ggplot2 scatter plot showing the dependence of Infection
risk on the Number of Nurses where the points are colored by Number of
Beds. Is there any interesting information in this plot that was not
visible in the plots in step 4? What do you think is a possible danger
of having such a color scale?
We dont see anything prominent in the above graph to mention any insights.
With regards to the color scale, having such a large color scale can be overwhelming plus having different shades of the colors makes it harder to analyze the graph to easily gain insights from it.
Convert graph from step 3 to Plotly with ggplotly function. What important new functionality have you obtained compared to the graph from step 3? Make some additional analysis of the new graph.
Upon using plotly, we are able to hover over the graph and actually get the corresponding values for various locations on the graph like the outliers, we can hover over those outliers to easily identify its values. Additionally, we can also zoom in to a particular part of the graph we might be interested to analyse further.
Use data plot-pipeline and the pipeline operator to make a histogram of Infection risk in which outliers are plotted as a diamond symbol ( â ) . Make this plot in the Plotly directly (i.e. without using ggplot2 functionality). Hint: select(), filter() and is.element() functions might be useful here.
Write a Shiny app that produces the same kind of plot as in step 4 but in addition include: a. Checkboxes indicating for which variables density plots should be produced b. A slider changing the bandwidth parameter in the density estimation (âbwâ parameter) Comment how the graphs change with varying bandwidth. Is there any bandwidth that is optimal for all of graphs?
The default bandwidth value that was provided is 1. And from the graphs, we can see that as we increase this value , the smoother the graphs becomes.
knitr::opts_chunk$set(
fig.align = "center",
fig.height = 3,
fig.width = 5,
message = FALSE,
warning = FALSE
)
set.seed(12345)
knitr::include_graphics("image-1.pdf")
##### Question-2
#1)
setwd("~/Downloads/M A S T E R S /Sem-3/Part-1/Visualization/labs/lab-1")
data = read.table("SENIC.txt",header = FALSE)
#2)
outlier <- function(variable){
quants <- quantile(variable)
Q1 <- quants[2]
Q3 <- quants[4]
upper <- Q3+1.5*(Q3-Q1)
lower <- Q1-1.5*(Q3-Q1)
value1 <- which(variable > upper)
value2 <- which(variable < lower)
final <- c(value1,value2)
return(final)
}
#3)
library(ggplot2)
ind <- outlier(data$V4)
values <- data$V4[ind]
viz <- ggplot(data = data, aes(x = V4)) + geom_density(kernel = "gaussian") + geom_point(data = data.frame(values), aes(x = values, y = 0),shape = 23, color = "red") + labs( x = "Infection risk", y = "Density")
viz
#4)
library(gridExtra)
col_names <- c("V2","V3","V5","V6","V7","V10","V11","V12")
col_real <- c("Length of Stay ", "Age", "Routine Culturing Ratio", "Routine Chest X-ray Ratio", "Number of Beds ", "Average Daily Census", "Number of Nurses", "Available Facilities & Services")
final<-list()
i <- 1
for(cols in col_names){
ind <- outlier(data[[cols]])
values <- data[[cols]][ind]
out <- ggplot(data = data, aes(x = .data[[cols]])) + geom_density(kernel = "gaussian") + geom_point(data = data.frame(values), aes(x = values, y = 0),shape = 23, color = "red") + labs( x = col_real[i] , y = "Density")
final[[cols]]<-out
i <- i + 1
}
plots <- arrangeGrob(grobs = final, ncol = 5, nrow = 2)
grid.arrange(plots)
#5)
ggplot(data,aes(V4,V11,color = factor(V7))) + geom_point() + labs( x = "Infection risk", y = "Number of Nurses")
#6)
library(plotly)
ggplotly(viz)
#7)
ind <- outlier(data$V4)
values <- data$V4[ind]
quants <- quantile(data$V4)
Q1 <- quants[2]
Q3 <- quants[4]
upper <- Q3+1.5*(Q3-Q1)
lower <- Q1-1.5*(Q3-Q1)
data %>% plot_ly(x= ~V4, type = "histogram") %>% select(V4) %>% filter(V4 > upper | V4 < lower) %>% add_trace(type = "scatter", mode = "markers", marker = list(symbol = "diamond")) %>% layout(xaxis = list(title = 'Infection risk'), showlegend = FALSE)
#8)
library(shiny)
colnames(data) <- c("Identification Number","Length of Stay ", "Age","Infection Risk", "Routine Culturing Ratio", "Routine Chest X-ray Ratio", "Number of Beds ","Medical School Affiliation","Region", "Average Daily Census", "Number of Nurses", "Available Facilities & Services")
ui <- fluidPage(
checkboxGroupInput("variable", "Variables to show:",c("Length of Stay ", "Age","Infection Risk", "Routine Culturing Ratio", "Routine Chest X-ray Ratio", "Number of Beds ", "Average Daily Census", "Number of Nurses", "Available Facilities & Services")),
sliderInput("integer", "BW bandwidth:",min = 0, max = 100,value = 1),
plotOutput("plots")
)
# Define server actions
server <- function(input, output) {
# Use input
# Save updates to output
output$plots <- renderPlot({
final_2<-list()
col_names_2 <- input$variable
for(cols in col_names_2){
ind <- outlier(data[[cols]])
values <- data[[cols]][ind]
out <- ggplot(data = data, aes(x = .data[[cols]])) + geom_density(kernel = "gaussian",bw = input$integer) + geom_point(data = data.frame(values), aes(x = values, y = 0),shape = 23, color = "red") + labs( x = cols , y = "Density")
final_2[[cols]]<-out
}
plots <- arrangeGrob(grobs = final_2, ncol = 5, nrow = 2)
#print(final_2)
#print(input$variable)
#grid.arrange(plots)
plot(plots)
#print(plots)
})
}
# Run the application
shinyApp(ui = ui, server = server)